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^ ' Abstract 

^. 

We outline some of Chris Wallace's contributions to pseudo-random 

CO ' number generation. In particular, we consider his recent idea for gen- 

erating normally distributed variates without relying on a source of 
lO ' uniform random numbers, and compare it with more conventional 

^^ \ methods for generating normal random numbers. Implementations 

of Wallace's idea can be very fast (approximately as fast as good uni- 
form generators). We discuss the statistical quality of the output, and 
mention how certain pitfalls can be avoided. 



c^ ' 1 Introduction 



In many simulation, graphics, simulated-annealing, cryptographic and Monte 
Carlo/Las Vegas programs, a substantial fraction of the time is used in gen- 
erating pseudo-random numbers from the uniform, normal or other distribu- 
tions, so methods of generating such numbers have received much attention. 



*This paper was originally dedicated to Professor Chris Wallace [J^ on the occasion 
of his 70th birthday, but unfortunately Chris passed away before it could be published. 
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This paper is dedicated to the memory of Chris Wallace, and our in- 
tention is to outline Wallace's substantial contribution to several aspects of 
random number generation, both in hardware and software. 

In ^we consider hardware random number generators (RNGs), and in 
§H]we mention (software) uniform RNGs. In ^we consider "conventional" 
normal random number generators, and in ^ we consider Wallace's new 
"maximum entropy" idea for normal RNGs that do not depend in an essen- 
tial way on a source of uniform RNGs. This idea is aesthetically appealing 
(why bother to generate uniform random numbers just in order to transform 
them by some time-consuming process into normal random numbers?) and 
has the potential to give extremely fast normal RNGs. 

2 Hardware RNGs 

In some cryptographic applications it is important for the numbers to be gen- 
uinely random, in the sense of being unpredictable, and not merely "pseudo- 
random", in the sense of passing various statistical tests. For example, this 
is the case when generating "one-time pads", or when constructing ran- 
dom primes whose products are to be made public for use with the Rivest, 
Shamir and Adleman "RSA" public- key cryptosystem [34], or when con- 
structing exponents to be used in the Diffie-Hellman key exchange protocol 
or the El Gamal public- key cryptosystem |27j . 

Wallace in [37] described a simple hardware device that could provide 
a stream of unpredictable 32-bit numbers at a rate of 64 Mbit/sec, using a 
4 MHz clock. The device was connected to the memory-mapped I/O bus 
of a multiprocessor computer, and appeared to a software process as a sin- 
gle 32-bit word of memory whose content was different (and unpredictable) 
every time it was read. Technology has advanced since 1990 so an implemen- 
tation using similar ideas could now use a much faster clock. This could, for 
example, be used in the implementation of Rabin's "everlasting encryption" 
scheme [U O [26l [32] , which depends on the availability of a high- volume 
stream of random and unpredictable bits. 

3 Uniform RNGs 

In [36] , Wallace considered several ways of obtaining uniform pseudo-random 
number generators with period close to 2®^ on machines with 32-bit words. 
There are many ways to accomplish this [71 [I3l [211 ESI [25] , but most of them 
require a large amount of state information. This can be a problem if several 



independent streams of random numbers are required simultaneously. With 
Wallace's proposal, only two 32-bit words of state information are required. 

4 Conventional Normal RNGs 

Most popular algorithms for generating normally distributed pseudo-random 
numbers are based on some variant of the rejection method, pioneered by von 
Neumann [2S]. More recent references are [H El [121 [12 [201 E2] . Wallace [33] 
contributed some elegant and efficient generators of this class. 

Rejection methods for normally distributed pseudo-random numbers re- 
quire on average some number [/ > 1 of uniformly distributed numbers per 
normally distributed number. Thus, they can not be faster than the uniform 
random number generator, and are typically several times slower. Rejection 
methods for the normal distribution usually (though not always [HI 114] ) 
involve the computation of functions such as log, sin, cos, which is slow 
compared to the time required to generate a uniform pseudo-random num- 
ber. Leva [22l Table 1] compared several of the better methods and found 
that they are at least five times slower than a fast uniform generator on the 
same machine. 

5 Maximum Entropy Normal RNGs 

Wallace [38] revolutionised normal random number generation by his discov- 
ery of a class of methods that do not depend in an essential way on uniform 
generators. Similar ideas can be used to generate pseudo-random numbers 
with some other distributions. In Wallace's paper |38j the uniform, Gaussian 
(normal) and exponential distributions are considered as maximum-entropy 
distributions subject to the following constraints: 

Uniform: < x < 1 

Gaussian: E(x^) = 1 

Exponential: E(x) = 1, x > 0. 

The idea of a maximum-entropy distribution is most easily seen in the 
discrete case of A'" possibilities with probabilities pi, ■ ■ ■ ,pn- Subject to the 
constraints pj > and YlPj = 1' ^^^ uniform distribution pi = • • • = Pn = 
1/A^ maximises the entropy S = —^Pj \ogpj. This can be proved using La- 
grange multipliers. Similarly, the continuous distribution on [0, 1] that max- 
imises — /g f (x) log f{x)dx is the uniform distribution, and the continuous 



distribution on (— c«,+oo) that maximises — J_^ f{x)log f{x)dx subject 

to J_°° x'^f{x)dx = 1 is the Gaussian distribution. These statements can 
be proved using the calculus of variations. For the reader unfamiliar with 
Bayesian and maximum entropy methods, a good introduction is Jaynes [17] . 
An annotated bibliography is available at |18j . 

In the following we restrict our attention to the Gaussian case, since that 
is where Wallace's idea gives the most significant speedup over conventional 
methods. For example, Wallace's own implementation FastNorm is reported 
in |38t §5] to be only 13 percent slower than a generalised Fibonacci uniform 
random number generator on a RISC workstation. 

Wallace proposed his method in a Technical Report in 1994, and a revi- 
sion of this Report appeared two years later [38] along with an implemen- 
tation fastnorm. Some changes in the implementation were made in 1998, 
resulting in an improved implementation fastnorm2 [39]. There is a more 
recent and probably better implementation fastnormS [H], but it was not 
available when our tests were performed, so we restrict our comments to 
fastnorm2. 

Wallace [38] describes two implementations - one using integer arith- 
metic, and the other using floating-point arithmetic. On the workstation 
that he tested it on, the integer version was faster, but this might not be 
true on more recent machines with faster floating-point hardware. 

Traditional normal RNGs are inefficient on vector processors. In 1993 the 
author compared various normal RNGs on vector processors and concluded 
that careful implementations of old methods such as the 1958 Polar method 
of Box, Muller and Marsaglia (see Knuth |21^ Algorithm P]) and the 1959 
Box-Muller method [2H [28] were faster than more recent methods [22] on 
vector processors produced by companies such as Cray, Fujitsu, and NEC: 
see [8l|30]. When Wallace's maximum-entropy idea appeared, it was clear 
that the landscape had changed, although the published implementation 
fastnorm was not intended to be efficient on vector processors. Thus, the 
author implemented an efficient vectorised version rann4 O [10] of Wallace's 
maximum-entropy idea. rann4 and a more recent implementation rannw [TT] 
are more than three times faster than the methods previously thought to be 
the most efficient on vector processors. 

5.1 Wallace's fastnorm algorithm 

Many uniform random number generators generate one or more new uniform 
variables from a set of previously-generated uniform variables. Wallace's 
idea is to apply the same principle to normal random number generators. 



Given a set of normally distributed random variables, we can generate a new 
set of normally distributed random variables by applying a linear transfor- 
mation that respects the "maximum entropy" constraint. This avoids the 
time-consuming conversion of uniform to normal variables that is required 
in conventional normal random number generators (see @. 

The key idea is: if x is an n- vector of independent, identically distributed 
A'^(0, 1) random variables xi, . . . ,Xn, and Q is any n x n orthogonal matrix, 
then y = Qx is another n- vector of independent, identically distributed 
A^(0, 1) random variables. (Of course, the components y, of y are dependent 
on the components Xj of x.) To prove the claim, observe that the component 
Xj has probability density {2/tt)~^''^ exp(— a;^/2), so the vector x has proba- 
bility density (2/7r)~"'^ exp(— r^/2), where r = \\x\\2- This density depends 
only on r, the distance of x from the origin. However, since Q is orthogonal, 

llylb = WQxh = \\x\\2 = r. 

Suppose that the n-vector x is a pool of n pseudo-random numbers that 
(we hope) are independent and normally distributed. We can generate a 
new pool y = Qx by applying an orthogonal transformation Q. However, 
several problems arise. 

5.2 Undesirable correlations 

yi is correlated with Xj. In fact, yi = QijXj + • • • , so E(yjXj) = E(gjjx|) = 
Qij. This problem can be overcome by applying several different orthogonal 
tranformations Qi,Q2t ■ ■ with a random choice of signs, so when averaged 
over all transformations F,{qij) ~ 0. 

5.3 Cost of transformations 

It is too expensive to apply a general nx n orthogonal transformation Q to 
produce n new random numbers. This would involve of order n multiplica- 
tions (and a similar number of additions) per random number generated. To 
overcome this problem, we can take Q to have a special form, e.g. in rann4 
we use a product of plane rotations of the form 
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cos 9 sin 9 
— sin 9 cos 9 



where 9 varies, but is held constant within each inner loop. We do not need 
to compute trigonometric functions, since sm9 = 2t/(l + t^) and cos 9 = 
(1 — t^)/(l + t^), where t = tan(6'/2) varies; the angle 9 is defined only for 
mathematical convenience and is never computed. 
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In his implementation fastnorm, Wallace preferred to use 4x4 orthogonal 
matrices ^1,^2,^3,^4, where 



2 



and A2, A3, ^4 are similar. The advantage (on a machine with slow floating- 
point multiplication) is that multiplication of a 4- vector by Ai requires only 
seven additions and one division by two (for details see [SSi §2.1]). 

The inner loop of the implementation is similar to the inner loop for the 
popular "generalised Fibonacci" uniform random number generators [3l El 
la m [211 [231 EB [33]. Wallace's implementation of fastnorm on a RISC 
workstation is about as fast as a good uniform random number generator 
on the same workstation. 

5.4 Mixing 

As Wallace observes p8l §2.2], it is desirable that any value in the pool 
should eventually contribute to every value in the pools formed after sev- 
eral passes. In other words, the transformation from one pool to the next 
should be strongly "mixing" . In our experience this is a tricky aspect of the 
implementation of generators based on Wallace's idea - several attempts 
which appeared plausible did not produce acceptable random numbers (af- 
ter transformation to uniform variates they failed various statistical tests in 
Marsaglia's Diehard package [24] )• 

In fastnorm, Wallace ensures mixing by regarding the pool of 1024 val- 
ues as a 256 x 4 matrix which is (implicitly) transposed at each pass; an 
additional ad hoc permutation is applied by stepping some row indices with 
an odd stride (mod 256). For details see [38l §2.2]. 

In rann4 we effectively apply permutations of the form 7ri(j) = aj + 
7 mod n, 7r2(j) = I3j + 6 mod n, where gcd(a,n) = gcd(/3,n) = 1. Since n 
is a power of 2, any odd a and (3 can be chosen. For details see [H §3]. 

Although the mixing transformations used in fastnorm and rann4 appear 
satisfactory, they seem ad hoc and there is little helpful theory here - all we 
can do is apply empirical tests. 

5.5 Chi-squared correction 

Because Q is orthogonal, ||Qx||2 = \\x\\2, so the sum of squares of numbers 
in a pool remains constant. This is unsatisfactory, because if xi, . . . , x„ were 



independent samples from the normal A^(0, 1) distribution, we would expect 
J^KiKn^l *° have a chi-squared distribution x^, where i^ = n is the pool 
size. 

To overcome this defect, Wallace suggests that one pseudo-random num- 
ber from each pool should not be returned to the user, but should be used to 
approximate a random sample S from the xt distribution. A scaling factor 
can be introduced to ensure that the sum of squares of the u values in the 
pool (of which v — 1 are returned to the user) is S. If the routine is written 
to provide random numbers with mean // and variance o"^, then scaling by 
S^'"^ can be done at the same time as scaling by a, so it is essentially free. 

There are several approximations to the Xu distribution for large u. For 
example, the one used in rann4 is 

2x1 - (x + V2^7^^)' , 

where x is A^(0, 1). It would not be much more expensive to use the (more 
accurate) Wilson-Hilferty approximation [43j 
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Even better is 



where 



xl ^ A{x^-l) + {2{i^ - A^)^/^ X + ly 

r- /I 1 \ 2 ^ /I 

A = 2\/u sin - arcsin --= ] = — h O — 
\3 yjv ) 3 V^ 

satisfies the cubic equation A"^ — 3vA+2i' = 0. We can assume that v is large 
[v = 1024 in fastnorm; v depends on the size of the buffer provided by the 
user in rann4/rannw) , so all of these approximations are sufficiently accu- 
rate. A slow but exact Xu algorithm, such as that of Ahrens and Dieter [2], 
is not required. 

In the above approximations to Xu^ the variable x was supposed to have 
a normal distribution. If only n — 1 values are returned to the user from a 
pool of n values, the remaining (scaled) value x can be used to approximate 
xl for the next pool. This is a point where the implementations of fastnorm 
and fastnorm2 differ. In fastnorm, x is taken from the current pool, but in 
fastnorm2 it is taken from the previous pool. The choice used in fastnorm 
is undesirable because a large value of x, and hence a large scaling factor 
from the xi approximation, is correlated with a small sum of squares of 
the remaining values in the pool (since the sum of squares including x is 
invariant). 



5.6 More subtle correlations 

In ^5.21 we saw how, by using several orthogonal transformations, we could 
ensure that E(yjXj) ~ 0. However, more subtle correlations persist. Con- 
sider the simplified model 
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where R{0) is a plane rotation as above, and 9 is distributed uniformly in 
[0,27r). We write c = cos 6, s = sinO. Thus yi = cxi + sx2, 2/2 = —sxi + cx2- 
Suppose that xi and X2 are independent and normally distributed, with zero 
mean and unit variance. Then 

Eixlvl) = E{c^)E{x\) + E{s^)E{xlxl) = 2 / E{xl)E{vl) . 

In fastnorm/fastnorm2 and rann^/rannw, similar effects occur, although the 
undesirable correlations are small and they occur between well-separated 
outputs (the separations are of the order of the pool size) because of the 
permutations used to provide mixing (cf §5.4p . 

5.7 Other finite pool size effects 



Chris Wallace [40] has observed a phenomenon that, like the one discussed 
in ^5.61 becomes less significant as the pool size increases, but never disap- 
pears entirely for any finite pool size n. 

Consider a rare event such as the occurrence of a large normal variate x 
that is expected to occur say once in every lOn samples, i.e. once in every 
10 pools. The "energy" rr^ is distributed over only a small number (four) of 
variables in the next pool. Thus we can expect one or more of these variables 
to be unusually large. Although the distribution of values considered over 
many pools is correct, it is more likely that rare events will occur in adjacent 
pools. 

It is possible to devise statistical tests that detect this behaviour and/or 
the correlations described in §5.61 However, we have not obtained any sta- 
tistically significant results with a sample size of less than lO^n. 

Clearly, one way to reduce (though not eliminate) the significance of 
such effects is to increase the pool size (easy for rann^/rannw). Another 
way is to discard some of the numbers produced by the random number 
generator - e.g. we could use every third value, or the values in every third 
pool. This has an obvious effect on the speed of the generator, but because 
the underlying algorithm is so fast we can afford to do it and still have a 



random number generator that is faster than more conventional generators 
(c/®. 

5.8 Use of uniform RNGs 

Although normal generators based on the maximum entropy idea do not 
use uniform random numbers in any essential way, it is convenient to use 
a uniform RNG for purposes such as initialisation, selection of orthogonal 
transformations, etc. The advantage of the maximum entropy methods is 
that the number U of uniform distributed numbers required per normally 
distributed number is very small (of the order of 1/n for pool size n), whereas 
for rejection methods U > 1. 

If we choose a uniform random number generator with known long pe- 
riod, and use it at least once for each pool of normal random numbers (e.g. 
to select from a set of possible orthogonal transformations), then it is easy 
to guarantee that the period of the normal random number generator is at 
least as great as that of the uniform random number generator. Thus, al- 
though any use of a uniform random number generator might be considered 
contrary to the spirit of the maximum entropy method, it does have the 
practical benefit of guaranteeing a long period. If (as is certainly possible) 
we avoided using a uniform generator except perhaps for initialisation, then 
we could not guarantee a long period, although a short period would be 
extremely unlikely, since it would require an implausible coincidence in the 
initialisation. 

5.9 Summary 

Although care needs to be taken in the implementation of normal random 
number generators like fastnorm, and the end-user should be aware of the 
small but unavoidable defects discussed in § §5. 6115. 7} these generators have 
such a performance advantage over more conventional generators that they 
can not be ignored in applications where the speed of generation of pseudo- 
random numbers is critical. 
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